A Programmable Co-Processor for Profiling
نویسندگان
چکیده
Aggressive program optimization requires accurate profile information, but such accuracy requires many samples to be collected. We explore a novel profiling architecture that reduces the overhead of collecting each sample by including a programmable co-processor that analyzes a stream of profile samples generated by a microprocessor. From this stream of samples, the co-processor can detect correlations between instructions (e.g., memory dependence profiling) as well as those between different dynamic instances of the same instruction (e.g., value profiling). The profiler’s programmable nature allows a broad range of data to be extracted, post-processed, and formatted, as well as provides the flexibility to tailor the profiling application to the program under test. Because the co-processor is specialized for profiling, it can execute profiling applications more efficiently than a general-purpose processor. The co-processor should not significantly impact the cost or performance of the main processor because it can be implemented using a small number of transistors at the chip’s periphery. We demonstrate the proposed design through a detailed evaluation of load value profiling. Our implementation quickly and accurately estimates the value invariance of loads, with time overhead roughly proportional to the size of the instruction working set of the program. This algorithm demonstrates a number of general techniques for profiling, including: estimating the completeness of a profile, a means to focus profiling on particular instructions, management of profiling resources.
منابع مشابه
Design and Implementation of Field Programmable Gate Array Based Baseband Processor for Passive Radio Frequency Identification Tag (TECHNICAL NOTE)
In this paper, an Ultra High Frequency (UHF) base band processor for a passive tag is presented. It proposes a Radio Frequency Identification (RFID) tag digital base band architecture which is compatible with the EPC C C2/ISO18000-6B protocol. Several design approaches such as clock gating technique, clock strobe design and clock management are used. In order to reduce the area Decimal Matrix C...
متن کاملParallel programmable video co-processor design
Modern video applications call for computationally intensive data processing at very high data rate. In order to meet the high-performance/low-cost constraints, the stateof-the-art video processor should be a programmable design which performs various tasks in video applications without sacrificing the computational power and the manufacturing cost in exchange for such flexibility. In this pape...
متن کاملEmbedded Processor Selection Using FPGA-based Profiling
In embedded systems, modeling the performance of off-the-shelf processors is very important to enable the designer to estimate the capability of each candidate processor against the target application. Considering the large number of available embedded processors, the need has increased for building an infrastructure by which it is possible to estimate the performance of a given application on ...
متن کاملMassively Parallel Programmable Video Co
Modern video applications call for computationally intensive data processing at very high data rate. In order to meet the high-performance/low-cost constraints, the state-of-the-art video processor should be a programmable design which performs various tasks in video applications without sacriicing the computational power and the manufacturing cost in exchange for such exibility. In this paper,...
متن کاملAcceleration Framework using MicroBlaze Soft-core Processors on FPGAs
Offloading the complex computational kernel from the processor is the common way to improve performance of embedded system. In our work we are using MicroBlaze softcore processor in design and implementation of acceleration framework. In acceleration framework MicroBlaze is coupled with co-processor with the help of communication bus. We can attach the co-processor to our design that can handle...
متن کامل